Statistical-Based Abbreviation Expansion
Authors
Abstract
This paper deals with text normalization for highly inflectional languages. It focuses on abbreviation expansion and also on the normalization of numerals. Our text normalization system does not use any explicit parser or part-of-speech tagger, and thus it can be called lightly supervised. The standard rule-based text normalization method is compared with the proposed statistical method on the task of expanding Czech abbreviations.
Similar resources
A System for Automatic Abbreviation Expansion
A system for automatic abbreviation expansion was developed and tested for use with an AAC device. The system blends several technologies in a process that automatically expands user generated abbreviations while additionally providing spell-checking. Using a series of heuristic rules and a statistical language model, the system combines a series of rule scores and probabilities to rank valid w...
An easily implemented method for abbreviation expansion for the medical domain in Japanese text. A preliminary study.
BACKGROUND: One of the barriers for the effective use of computerized health-care related text is the ambiguity of abbreviations. To date, the task of disambiguating abbreviations has been treated as a classification task based on surrounding words. Application of this framework for languages that have no word boundaries requires pre-processing to segment a sentence into separate word sequences....
Automatic expansion of abbreviations by using context and character information
Unknown words such as proper nouns, abbreviations, and acronyms are a major obstacle in text processing. Abbreviations, in particular, are difficult to read/process because they are often domain-specific. In this paper, we propose a method for automatic expansion of abbreviations by using context and character information. In previous studies dictionaries were used to search for abbreviation ex...
Vocabulary expansion through automatic abbreviation generation for Chinese voice search
Long named entities are often abbreviated in oral Chinese language, and this usually leads to out-of-vocabulary (OOV) problems in speech recognition applications. The generation of Chinese abbreviations is much more complex than English abbreviations, most of which are acronyms and truncations. In this paper, we propose a new method for automatically generating abbreviations for Chinese named en...
RePaLi Participation to CLEF eHealth IR Challenge 2014: Leveraging Term Variation
This paper describes the participation of RePaLi, a team composed of members of IRISA, LIMSI and STL, to the biomedical information retrieval challenge proposed in the framework of CLEF eHealth. For this first participation, our approach relies on a state-of-the-art IR system called Indri, based on statistical language modeling, and on semantic resources. The purpose of semantic resources and ...